School Enrolment TrendsΒΆ

Authored by: Pooya Forghani Arani

Duration: 60 minutes

Level: Intermediate

Pre-requisite Skills: Python, Pandas, Matplotlib, NumbPy, Seaborn, Folium

ScenarioΒΆ

As a government planner, I want to analyse school enrolment trends across Melbourne so that we can plan future schools, allocate resources fairly, and respond to changing population needs in each region more effectively.

What this use case will teach youΒΆ

  • How to explore and understand real-world school data from Victoria
  • How to combine enrolment and location data for analysis
  • How to find trends in student numbers over time
  • How to use maps and charts to show insights clearly
  • How to support planning with data-driven decisions

Background and IntroductionΒΆ

Understanding school enrolment trends is very important for planning and decision-making in education. In a growing city like Melbourne, the number of students in each area can change quickly due to population growth, new housing developments, or changes in family needs. If enrolment increases in one region, more teachers, classrooms, or even new schools may be needed. On the other hand, if enrolments drop, resources may need to be adjusted. This use case looks at school enrolment data from 2020 to 2024, combined with school location data, to help us see how enrolment has changed over time and across different parts of the city. It includes details for each school such as type (primary, secondary), education sector (government, independent, or Catholic), and location. By analysing this data, we can find out which areas are growing, which sectors are changing, and where the most pressure on school infrastructure might be. This helps government planners, education leaders, and local councils make better choices about funding, resources, and new school developments. It also provides useful insights for parents and communities who want to understand the education landscape in their area. This project uses data analysis and visual tools to make these insights clear.

Datasets usedΒΆ

School Locations:

The school locations API provides detailed information about where schools are located across Victoria. It includes each school's name, type (such as primary or secondary), education sector, and full address. It also contains geographic coordinates, which allow schools to be shown on a map. This dataset helps with understanding school distribution and planning services based on location and region.

  • School Locations

School Enrolments:

The school enrolment datasets from 2020 to 2024 show how many students were enrolled in each school across Victoria each year. They include enrolment numbers by year level, school type, and education sector such as Government, Catholic, and Independent. This information helps track changes over time and supports better decisions for school planning, funding, and resource allocation.

  • All Schools FTE Enrolments 2024
  • All Schools FTE Enrolments 2023
  • All Schools FTE Enrolments 2022
  • All Schools FTE Enrolments 2021
  • All Schools FTE Enrolments 2020

Importing DatasetsΒΆ

In this section, the code loads school location and enrolment data from 2020 to 2024. It cleans and combines multiple CSV files, standardises column names, handles gender-based data in 2020, and prepares a single dataset for further analysis and visualisation using Python libraries.

InΒ [1]:
import requests
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import folium
from IPython.display import display, clear_output
import ipywidgets as widgets
import matplotlib.ticker as ticker
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import warnings; warnings.filterwarnings("ignore")

# for better display
pd.set_option('display.max_columns', None)

# JSON API endpoint for school locations
json_url = 'https://discover.data.vic.gov.au/api/3/action/datastore_search?resource_id=d26bf015-a1e5-48dd-a1d6-8edd4b0a511b&limit=5000'

# Fetch JSON response
response = requests.get(json_url)
data = response.json()

# Extract records from JSON
records = data['result']['records']

# Convert to DataFrame
location_df = pd.DataFrame.from_records(records)

# Display basic info
print("Location dataset loaded:", location_df.shape)

# List of enrolment CSV URLs
enrolment_urls = [
    "https://www.education.vic.gov.au/Documents/about/research/datavic/dv377_DataVic-AllSchoolsEnrolments-2024.csv",
    "https://www.education.vic.gov.au/Documents/about/research/datavic/dv355-VIC%20All%20Schools%20Enrolments%202023.csv",
    "https://www.education.vic.gov.au/Documents/about/research/datavic/dv335-allschoolsFTEenrolmentsFeb2022.csv",
    "https://www.education.vic.gov.au/Documents/about/research/datavic/dv316-allschoolsFTEenrolmentsFeb2021.csv",
    "https://www.education.vic.gov.au/Documents/about/research/datavic/dv300-allschoolsFTEenrolmentsFeb2020.csv"
]

# Define the simplified final column structure (for consistency)
target_columns = [
    'Education_Sector', 'Entity_Type', 'School_No', 'School_Name', 'School_Type', 'School_Status',
    'Prep Total', 'Year 1 Total', 'Year 2 Total', 'Year 3 Total', 'Year 4 Total', 'Year 5 Total', 'Year 6 Total',
    'Primary Ungraded Total', 'Primary Total',
    'Year 7 Total', 'Year 8 Total', 'Year 9 Total', 'Year 10 Total', 'Year 11 Total', 'Year 12 Total',
    'Secondary Ungraded Total', 'Secondary Total', 'Grand Total', 'Year', 'CENSUS_TYPE'
]

enrolment_dfs = []

for url in enrolment_urls:
    print("Processing enrolment data from:", url)
    df = pd.read_csv(url, encoding='cp1252')
    
    # Clean column names
    df.columns = df.columns.str.strip().str.replace('"', '', regex=False)
    
    if 'Prep Males Total' in df.columns:
        # It's the 2020 dataset (has male/female columns)

        # Combine male + female columns for total columns
        def combine_cols(df, male_col, female_col, new_col):
            male_series = df[male_col] if male_col in df.columns else pd.Series(0, index=df.index)
            female_series = df[female_col] if female_col in df.columns else pd.Series(0, index=df.index)
            df[new_col] = pd.to_numeric(male_series, errors='coerce').fillna(0) + pd.to_numeric(female_series, errors='coerce').fillna(0)


        # Year-wise aggregation
        years = ['Prep', 'Year 1', 'Year 2', 'Year 3', 'Year 4', 'Year 5', 'Year 6',
                 'Primary Ungraded', 'Year 7', 'Year 8', 'Year 9', 'Year 10', 'Year 11', 'Year 12', 'Secondary Ungraded']
        
        for y in years:
            combine_cols(df, f'{y} Males Total', f'{y} Females Total', f'{y} Total')

        # Keep only the target columns if they exist
        keep_cols = [col for col in target_columns if col in df.columns]
        df = df[keep_cols]

    else:
        # Other years - already in correct format, just align column names
        df.columns = df.columns.str.replace(' ', ' ', regex=False)  # Ensure consistent spacing
        df = df[[col for col in df.columns if col in target_columns]]  # Keep only expected columns

    # Append cleaned DataFrame
    enrolment_dfs.append(df)

# Combine all years
enrolment_df = pd.concat(enrolment_dfs, ignore_index=True)

# Final cleanup
enrolment_df.columns = enrolment_df.columns.str.strip().str.replace(' ', '_')
print("Combined enrolment dataset shape:", enrolment_df.shape)
Location dataset loaded: (2301, 25)
Processing enrolment data from: https://www.education.vic.gov.au/Documents/about/research/datavic/dv377_DataVic-AllSchoolsEnrolments-2024.csv
Processing enrolment data from: https://www.education.vic.gov.au/Documents/about/research/datavic/dv355-VIC%20All%20Schools%20Enrolments%202023.csv
Processing enrolment data from: https://www.education.vic.gov.au/Documents/about/research/datavic/dv335-allschoolsFTEenrolmentsFeb2022.csv
Processing enrolment data from: https://www.education.vic.gov.au/Documents/about/research/datavic/dv316-allschoolsFTEenrolmentsFeb2021.csv
Processing enrolment data from: https://www.education.vic.gov.au/Documents/about/research/datavic/dv300-allschoolsFTEenrolmentsFeb2020.csv
Combined enrolment dataset shape: (11410, 26)

Data Cleaning & PreparationΒΆ

In this section, the code prepares the data for merging by converting school numbers and coordinates to the correct data types. It then filters school locations to include only those in Melbourne and merges them with enrolment data using school names, resulting in a combined dataset with both enrolment and location details.

InΒ [2]:
# Ensure School_No is the same type in both DataFrames
enrolment_df['School_No'] = enrolment_df['School_No'].astype(str)
location_df['School_No'] = location_df['School_No'].astype(str)

location_df['X'] = location_df['X'].astype(float)
location_df['Y'] = location_df['Y'].astype(float)

# Filter location_df to only include rows with 'Melbourne' in Address_Town
melbourne_schools = location_df[
    (location_df['Address_Postcode'].astype(str).str.isnumeric()) & 
    (location_df['Address_Postcode'].astype(int).between(3000, 3207))
]

# Now merge only with filtered Melbourne schools
merged_df = pd.merge(enrolment_df, melbourne_schools[['School_Name', 'X', 'Y', 'LGA_Name']], on='School_Name', how='inner')

# Inspect merged data
merged_df
Out[2]:
Education_Sector Entity_Type School_No School_Name School_Type School_Status Prep_Total Year_1_Total Year_2_Total Year_3_Total Year_4_Total Year_5_Total Year_6_Total Primary_Ungraded_Total Primary_Total Year_7_Total Year_8_Total Year_9_Total Year_10_Total Year_11_Total Year_12_Total Secondary_Ungraded_Total Secondary_Total Grand_Total Year CENSUS_TYPE X Y LGA_Name
0 Catholic 2 20 Parade College Secondary O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 360.0 329.0 338.0 311.0 304.0 304.0 0.0 1946.0 1946.0 2024 F 145.066978 -37.690178 Banyule (C)
1 Catholic 2 25 Simonds Catholic College Secondary O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 52.0 71.0 64.0 82.0 65.0 66.0 0.0 400.0 400.0 2024 F 144.952883 -37.805971 Melbourne (C)
2 Catholic 2 26 St Mary’s College Melbourne Secondary O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 57.0 71.0 85.0 85.0 76.0 100.0 0.0 474.0 474.0 2024 F 144.997001 -37.859365 Port Phillip (C)
3 Catholic 2 29 St Patrick's School Primary O 29.0 27.0 35.0 35.0 43.0 36.0 39.0 0.0 244.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 244.0 2024 F 145.352614 -37.752735 Yarra Ranges (S)
4 Catholic 2 29 St Patrick's School Primary O 29.0 27.0 35.0 35.0 43.0 36.0 39.0 0.0 244.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 244.0 2024 F 145.070159 -37.983569 Kingston (C)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9840 Independent 2 2112 Brotherhood of St Laurence|David Scott School Special O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 42.0 35.0 0.0 0.0 104.0 104.0 2020 F 145.119248 -38.148247 Frankston (C)
9841 Independent 2 2113 Hester Hornbrook Academy Special O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 84.2 75.0 0.0 0.0 186.0 186.0 2020 F 144.956150 -37.816544 Port Phillip (C)
9842 Independent 2 2114 SEDA College (Victoria) Secondary O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 421.0 0.0 0.0 799.0 799.0 2020 F 145.054006 -37.830714 Boroondara (C)
9843 Independent 2 2116 Cheder Levi Yitzchok Inc Pri/Sec O 27.0 8.0 0.0 15.0 0.0 12.0 18.0 0.0 90.0 15.0 0.0 0.0 0.0 0.0 0.0 0.0 15.0 105.0 2020 F 144.991910 -37.860567 Port Phillip (C)
9844 Independent 2 2227 Fitra Community School Primary O 10.0 3.0 3.0 1.0 0.0 2.0 0.0 0.0 21.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 21.0 2020 F 145.139319 -37.944839 Kingston (C)

9845 rows Γ— 29 columns

Descriptive AnalysisΒΆ

Total Number of Schools by Sector and TypeΒΆ

The code groups the merged data by education sector and school type, then counts the number of unique schools in each group. It creates a bar chart showing how many schools belong to each type and sector, adds count labels on bars, and improves the chart’s style and readability.

InΒ [Β ]:
# Group and count unique schools by Education_Sector and School_Type
school_counts = merged_df.groupby(['Education_Sector', 'School_Type'])['School_No'].nunique().reset_index(name='School_Count')

# Sort data
school_counts = school_counts.sort_values(by='School_Count', ascending=False)

# Plot setup
plt.figure(figsize=(12, 6))
barplot = sns.barplot(
    data=school_counts,
    x='Education_Sector',
    y='School_Count',
    hue='School_Type',
    palette='Set2'
)

# Add count labels on each bar
for container in barplot.containers:
    barplot.bar_label(container, fmt='%.0f', label_type='edge', fontsize=10)

# Title and axis labels
plt.title('Number of Schools by Sector and Type', fontsize=16)
plt.xlabel('Education Sector', fontsize=12)
plt.ylabel('Number of Schools', fontsize=12)
plt.xticks(rotation=30)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.legend(title='School Type', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()
No description has been provided for this image

Government schools make up the largest group, especially Primary schools with 472. Catholic schools also have a high number of Primary schools (323), while Independent schools have more Pri/Sec combined schools. Special and Language schools are much fewer overall. This visual helps us understand the structure of Victoria’s education system and how different types of schools are distributed across sectors. It shows that the Government sector plays the largest role in providing diverse school types.

Average Enrolment Over TimeΒΆ

This code calculates the average enrolment per school type for each year from 2020 to 2024. It then creates a line chart to show trends over time for each school type. The plot includes clear labels, markers, a formatted y-axis, and a legend to compare enrolment across school types.

InΒ [Β ]:
# Group by Year and School_Type, then calculate average enrolment
avg_by_type = merged_df.groupby(['Year', 'School_Type'])['Grand_Total'].mean().reset_index()

# Ensure Year is integer for proper x-axis ticks
avg_by_type['Year'] = avg_by_type['Year'].astype(int)

# Plot
plt.figure(figsize=(12, 6))
sns.lineplot(data=avg_by_type, x='Year', y='Grand_Total', hue='School_Type', marker='o', linewidth=2.5)

# Titles and labels
plt.title('Average Enrolment per School by Type (2020-2024)', fontsize=16)
plt.xlabel('Year')
plt.ylabel('Average Enrolment')
plt.grid(True, linestyle='--', alpha=0.5)

# Format Y-axis
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{int(x):,}"))

# Fix X-axis ticks
plt.xticks(avg_by_type['Year'].unique())

plt.legend(title='School Type')
plt.tight_layout()
plt.show()
No description has been provided for this image

Pri/Sec schools have the highest average enrolment, followed by Secondary schools. Primary and Special schools have lower average enrolments and stayed mostly stable. Language schools saw a dip in 2021 but then rose again. These trends help us see how school sizes vary over time depending on their type. Some types grow faster than others, possibly due to demand, location, or program offerings. This chart supports planning by showing which school types tend to handle larger student numbers each year.

Tracking Year-Level Growth and Total Enrolment TrendsΒΆ

This section visualises enrolment trends across multiple school years and grade levels.

Grade-Level Yearly Trends:

The first line graph presents enrolment numbers for each individual year level (from Year 1 to Year 12) across the observed years. This breakdown helps identify patterns like which grades are growing or shrinking over time, making it easier to understand population movements across the education system. Each line represents a grade, giving a clear picture of how student numbers evolve as they progress through school.

Primary, Secondary, and Overall Growth:

The second graph focuses on total enrolments in Primary, Secondary, and combined (Grand Total) categories. It includes helpful K-format annotations (e.g., "↑ 2.3K") showing the change from the previous year. These are color-coded: green for increases, red for decreases, and arrows clearly indicate the direction of change. This visualisation makes it easy to track yearly fluctuations and overall growth at a glance.

InΒ [5]:
# Define grade-level columns
grade_cols = [
    'Year_1_Total', 'Year_2_Total', 'Year_3_Total',
    'Year_4_Total', 'Year_5_Total', 'Year_6_Total',
    'Year_7_Total', 'Year_8_Total', 'Year_9_Total',
    'Year_10_Total', 'Year_11_Total', 'Year_12_Total'
]

# Prepare the data
enrolment_trend_df = merged_df[['Year'] + grade_cols].dropna()
grade_trends = enrolment_trend_df.groupby('Year')[grade_cols].sum()

# Plot
plt.figure(figsize=(10, 5))
for grade in grade_cols:
    plt.plot(grade_trends.index, grade_trends[grade], marker='o', label=grade)

plt.title("Enrolment Trends by Grade Level", fontsize=12)
plt.xlabel("Year", fontsize=12)
plt.ylabel("Total Enrolments", fontsize=12)

# x-axis shows each year
plt.xticks(ticks=grade_trends.index, rotation=45)

plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title="Grade Level", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()
No description has been provided for this image

Most grade levels show steady numbers, especially Year 1 to Year 6, which make up the primary range. Higher years like Year 11 and Year 12 show smaller but growing enrolments. Some fluctuations may reflect population changes or differences in student retention across years. Seeing all year levels together helps us understand how the entire student journey is changing. It supports schools and governments in adjusting staffing and facilities for the future based on demand in specific grade levels.

InΒ [Β ]:
# Define grade-level columns
total_grade_cols = [
    'Primary_Total', 'Secondary_Total', 'Grand_Total'
]

# Prepare the data
enrolment_trend_df = merged_df[['Year'] + total_grade_cols].dropna()
grade_trends = enrolment_trend_df.groupby('Year')[total_grade_cols].sum()

# Plot
plt.figure(figsize=(12, 6))
for grade in total_grade_cols:
    plt.plot(grade_trends.index, grade_trends[grade], marker='o', label=grade)

    # Annotate year-on-year differences in "K" units with arrows
    years = grade_trends.index.to_list()
    values = grade_trends[grade].to_list()
    for i in range(1, len(values)):
        diff = values[i] - values[i-1]
        diff_k = diff / 1000  # Convert to K
        sign = "+" if diff_k >= 0 else ""
        arrow = "↑" if diff_k > 0 else "↓" if diff_k < 0 else "β†’"
        color = 'darkgreen' if diff_k > 0 else 'red' if diff_k < 0 else 'gray'

        plt.annotate(
            f"{arrow} {sign}{abs(diff_k):.1f}K",
            xy=((years[i-1] + years[i]) / 2, (values[i] + values[i-1]) / 2),
            textcoords="offset points",
            xytext=(0, 8),
            ha='center',
            fontsize=9,
            color=color
        )

plt.title("Total Enrolment Trends by Grade Level", fontsize=14)
plt.xlabel("Year", fontsize=12)
plt.ylabel("Total Enrolments", fontsize=12)

# X-axis ticks
plt.xticks(ticks=grade_trends.index, rotation=45)

plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title="Grade Level", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()
No description has been provided for this image

Enrolments dropped in 2021 and 2022 but picked up again from 2023. Primary enrolments had small drops then a rise, while Secondary enrolments grew strongly in 2023 and 2024. The Grand Total increased most in 2024. Using arrows and β€œK” units makes changes easy to understand. This chart is useful for decision-makers to see overall movement and plan school capacity, teacher hiring, or policy responses accordingly.

Mapping Schools in MelbourneΒΆ

In this part, the code creates an interactive map that shows school locations and enrolment numbers for a selected year. It starts by removing rows with missing coordinates or enrolment values. A dropdown menu lets users pick a year. When a year is selected, the map updates to show only the schools for that year. Each school appears as a circle, and the size of the circle shows how many students are enrolled; bigger bubbles mean more students. When clicked, each bubble shows extra information about the school, like its name, type, sector, and status. A custom legend is also added to the map to show the selected year and the minimum and maximum enrolment values for that year.

InΒ [4]:
geo_df = merged_df.dropna(subset=['X', 'Y', 'Grand_Total'])
year_selector = widgets.Dropdown(
    options=sorted(geo_df['Year'].unique()),
    description='Year:',
    style={'description_width': 'initial'}
)

def draw_map(selected_year):
    clear_output(wait=True)
    display(year_selector)

    # Filter for selected year
    year_df = geo_df[geo_df['Year'] == selected_year]

    # Handle empty case
    if year_df.empty:
        print(f"No data available for year {selected_year}")
        return

    # Calculate bubble scaling
    min_radius = 2
    max_radius = 15
    min_enrol = year_df['Grand_Total'].min()
    max_enrol = year_df['Grand_Total'].max()

    def scale_radius(value):
        return min_radius + (max_radius - min_radius) * (value - min_enrol) / (max_enrol - min_enrol)

    # Create the map
    m = folium.Map(location=[-37.8136, 144.9631], zoom_start=11)

    for _, row in year_df.iterrows():
        radius = scale_radius(row['Grand_Total'])
        
        # Create popup content
        popup_html = f"""
        <div style="width: 150px;">
            <strong>{row['School_Name']}</strong><br>
            <b>Enrolment:</b> {int(row['Grand_Total']):,}<br>
            <b>School Type:</b> {row['School_Type']}<br>
            <b>Sector:</b> {row['Education_Sector']}<br>
            <b>Status:</b> {row['School_Status']}
        </div>
        """

        popup = folium.Popup(popup_html, max_width=300, min_width=250)

        folium.CircleMarker(
            location=(row['Y'], row['X']),
            radius=radius,
            popup=popup,
            color='blue',
            fill=True,
            fill_opacity=0.6
        ).add_to(m)

    # Create custom legend HTML
    legend_html = f'''
    <div style="
        position: fixed;
        top: 100px;
        left: 10px;
        width: 220px;
        height: 140px;
        background-color: white;
        border:2px solid grey;
        z-index:9999;
        font-size:20px;
        padding: 10px;
        box-shadow: 2px 2px 6px rgba(0,0,0,0.3);
        ">
        <b>Enrolment:</b><br>
        Year: {selected_year}<br>
        Min: {int(min_enrol):,} students<br>
        Max: {int(max_enrol):,} students
    </div>
    '''

    m.get_root().html.add_child(folium.Element(legend_html))

    display(m)

year_selector.observe(lambda change: draw_map(change['new']) if change['type'] == 'change' and change['name'] == 'value' else None)
draw_map(year_selector.value)
Dropdown(description='Year:', index=4, options=(np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2023)…
Make this Notebook Trusted to load map: File -> Trust Notebook

Trend Breakdown by SectorΒΆ

This code creates a line chart to show total school enrolments from 2020 to 2024, grouped by education sector. It uses different colors and markers for each sector to highlight trends over time. The chart includes a title, labels, legend, and formatted axes, making it easy to compare enrolment changes across sectors during the five-year period.

InΒ [Β ]:
# Group and summarise
sector_trends = merged_df.groupby(['Year', 'Education_Sector'])['Grand_Total'].sum().reset_index()

# Ensure Year is integer
sector_trends['Year'] = sector_trends['Year'].astype(int)

# Plot
plt.figure(figsize=(12, 6))
sns.lineplot(
    data=sector_trends,
    x='Year',
    y='Grand_Total',
    hue='Education_Sector',
    marker='o',
    linewidth=2.5,
    palette='tab10'
)

# Title and labels
plt.title('Total Enrolment by Sector Over Years (2020-2024)', fontsize=16)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Total Enrolment', fontsize=12)

# Grid and y-axis formatting
plt.grid(True, linestyle='--', alpha=0.5)
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{int(x):,}"))
plt.xticks(sector_trends['Year'].unique())  # Ensure correct year ticks

# Legend
plt.legend(title='Education Sector', loc='upper left')

plt.tight_layout()
plt.show()
No description has been provided for this image

Government schools maintain the highest enrolment numbers and continue to grow. Catholic school enrolments slightly declined from 2020 to 2023 and only started rising again in 2024. Independent schools show steady growth across all years. These trends help us understand how families choose schools and which sectors are expanding. The information can help guide funding, policy, and infrastructure investment across sectors, especially if one sector is growing faster than others.

LGA-Level AnalysisΒΆ

This code creates a line chart showing enrolment trends from 2020 to 2024 for the top 15 LGAs in Victoria. It first finds the LGAs with the highest total enrolments, then plots each one with a different line. The chart includes a clear title, labels, gridlines, and formatted numbers, making it easy to compare changes over time.

InΒ [7]:
# Group by Year and LGA
lga_enrol = merged_df.groupby(['Year', 'LGA_Name'])['Grand_Total'].sum().reset_index()

# Get top 10 LGAs based on total enrolment across all years
top_lgas = lga_enrol.groupby('LGA_Name')['Grand_Total'].sum().nlargest(15).index

# Filter data to include only top 10 LGAs
filtered_lga = lga_enrol[lga_enrol['LGA_Name'].isin(top_lgas)]

# Ensure Year is integer for correct plotting
filtered_lga['Year'] = filtered_lga['Year'].astype(int)

# Plot
plt.figure(figsize=(14, 7))
sns.lineplot(
    data=filtered_lga,
    x='Year',
    y='Grand_Total',
    hue='LGA_Name',
    marker='o',
    linewidth=2
)

# Titles and labels
plt.title('Enrolment Trends in Top 15 LGAs (2020-2024)', fontsize=16)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Total Enrolment', fontsize=12)
plt.xticks(filtered_lga['Year'].unique())
plt.grid(True, linestyle='--', alpha=0.5)

# Format y-axis numbers with commas
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{int(x):,}"))

# Clean legend
plt.legend(title='LGA Name', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()
No description has been provided for this image

Wyndham has shown the strongest and most consistent growth. Other LGAs such as Greater Dandenong and Whitehorse also show steady increases, while Boroondara and Brimbank have declined slightly. Tracking enrolment at the LGA level helps local councils and school planners plan for infrastructure, classroom space, and services. Understanding where enrolments are rising fastest is important for preparing schools to meet growing demand in those areas.

Predictive Modeling for Enrolment TrendsΒΆ

In this section, we will build a predictive model to forecast future enrolment trends for schools in Melbourne. We will use historical enrolment data to train a machine learning model that can predict enrolments for the year 2025.

To predict future enrolment trends, we trained a machine learning model using past data from schools. We included features like the school type, sector, and last year's enrolment numbers to improve accuracy. After testing the model, we used it to estimate enrolments for the year 2025. This helped us identify the top 10 schools that are likely to have the highest number of students next year. The model showed good performance and gave useful insights for planning ahead.

InΒ [10]:
# Select relevant columns
df = merged_df[['School_No', 'Year', 'Grand_Total', 'Education_Sector', 'School_Type']].dropna()

# Encode categorical variables
le_sector = LabelEncoder()
le_type = LabelEncoder()
df['Education_Sector'] = le_sector.fit_transform(df['Education_Sector'])
df['School_Type'] = le_type.fit_transform(df['School_Type'])

# Create lag feature for previous year's enrolment per school
df = df.sort_values(by=['School_No', 'Year'])
df['Prev_Year_Enrolment'] = df.groupby('School_No')['Grand_Total'].shift(1)
df = df.dropna()  # Remove first-year rows without lag

# Features & target
X = df[['Year', 'Education_Sector', 'School_Type', 'Prev_Year_Enrolment']]
y = df['Grand_Total']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}")
print(f"RΒ²: {r2:.2f}")
RMSE: 72.11
RΒ²: 0.97
InΒ [11]:
# Predict 2025 enrolments
future_df = df[df['Year'] == 2024].copy()

# Use 2024 enrolment as lag for 2025
future_df['Prev_Year_Enrolment'] = future_df['Grand_Total']
future_df['Year'] = 2025

# Prepare correct features for prediction
future_X = future_df[['Year', 'Education_Sector', 'School_Type', 'Prev_Year_Enrolment']]
future_df['Predicted_Enrolment_2025'] = model.predict(future_X)

# Select top 10 schools by predicted enrolment
top_schools = future_df.groupby('School_No')['Predicted_Enrolment_2025'].sum().nlargest(10).index
plot_df = future_df[future_df['School_No'].isin(top_schools)]
InΒ [22]:
# Plot historical vs predicted enrolments
plt.figure(figsize=(12, 6))
for school in plot_df['School_No'].unique():
    # Historical data for this school
    temp_hist = df[df['School_No'] == school]
    plt.plot(
        temp_hist['Year'], 
        temp_hist['Grand_Total'], 
        label=f"School {school} - Actual"
    )

    # Add predicted point for 2025 for this school
    temp_pred = plot_df[plot_df['School_No'] == school]
    plt.scatter(
        temp_pred['Year'], 
        temp_pred['Predicted_Enrolment_2025'], 
        marker='x', 
        s=80, 
        color='red'
    )

plt.title("Predicted 2025 Enrolments for Top Melbourne Schools")
plt.xlabel("Year")
plt.ylabel("Enrolments")
plt.grid(True, linestyle="--", alpha=0.5)
plt.legend()
plt.tight_layout()
plt.show()
No description has been provided for this image

Each school’s enrolment is shown from 2020 to 2024, with a red β€œX” showing its predicted value for 2025. The graph highlights which schools are expected to grow significantly. These predictions help school administrators and planners get ready for future student numbers, making sure facilities and resources can support upcoming changes in school populations.

Scenario-Based ForecastingΒΆ

In this section, we will explore scenario-based forecasting for school enrolments in Melbourne. By considering different growth scenarios, we can better understand potential future trends and prepare for various outcomes.

We explore three possible futures for Melbourne school enrolments from 2025 to 2030:

  • Best Case assumes steady population growth or policy changes that boost enrolments by 3% each year.

  • Base Case reflects a moderate and realistic trend, where enrolments increase by about 1% yearly, continuing the current pace.

  • Worst Case models a possible decline, where enrolments drop by 2% annually, due to factors like demographic changes or shifts in school preferences.

These scenarios help planners prepare for different possibilities. The line connecting 2024 to 2025 shows the model’s prediction before applying the yearly growth/decline rates for each scenario.

InΒ [24]:
# Filter Melbourne schools data only
melbourne_df = df.copy()

# Historical enrolments (2021-2024)
historical = melbourne_df.groupby('Year')['Grand_Total'].sum().loc[2021:2024]

# Prepare base data for 2025
future_base = melbourne_df[melbourne_df['Year'] == 2024].copy()
future_base['Year'] = 2025
future_base['Prev_Year_Enrolment'] = melbourne_df[melbourne_df['Year'] == 2024]['Grand_Total']
future_base['Predicted_Enrolment'] = model.predict(
    future_base[['Year', 'Education_Sector', 'School_Type', 'Prev_Year_Enrolment']]
)

# Define scenarios for 2025-2030
years = np.arange(2025, 2031)
scenarios = {'Best Case': 1.03, 'Base Case': 1.01, 'Worst Case': 0.98}
scenario_results = {}

# Calculate forecasted enrolments per scenario
for name, rate in scenarios.items():
    enrolments = []
    current = future_base['Predicted_Enrolment'].sum()
    for _ in years:
        enrolments.append(current)
        current *= rate
    scenario_results[name] = enrolments

# Combine historical + forecasts for smooth plotting
plt.figure(figsize=(14, 7))

# Plot historical enrolments (solid black line)
plt.plot(historical.index, historical.values, color='black', linewidth=2.5, marker='o', label='Historical Enrolments')

# Connect 2024 actual value to 2025 base prediction
base_2025 = future_base['Predicted_Enrolment'].sum()
plt.plot([2024, 2025], [historical.loc[2024], base_2025], color='gray', linestyle='--', linewidth=2, label="Transition to Prediction")

# Plot scenario-based forecasts (dashed lines)
for scenario, values in scenario_results.items():
    plt.plot(years, values, linestyle='--', linewidth=2, marker='o', label=f"{scenario} Forecast")

# Add labels and styles
plt.title("Melbourne Schools Enrolment: Historical Trends & Scenario-Based Forecasts (2021-2030)", fontsize=15)
plt.xlabel("Year", fontsize=12)
plt.ylabel("Total Enrolments", fontsize=12)
plt.legend()
plt.grid(True, linestyle="--", alpha=0.5)
plt.tight_layout()
plt.show()
No description has been provided for this image

It shows three possible scenarios: Best Case (↑3% growth), Base Case (↑1% growth), and Worst Case (↓2% decline). The dashed lines show how enrolments could change depending on different conditions. This helps schools and government prepare for multiple futures-whether enrolments grow due to population increases or drop due to other factors. Scenario planning helps guide better long-term decisions around infrastructure, budgeting, and teacher needs by offering a range of likely outcomes.

ConclusionΒΆ

In this notebook, we explored and analysed school enrolment data in Victoria from 2020 to 2024, combining it with school location data to extract useful insights. We used a mix of charts, maps, forecasts, and clustering techniques to understand how enrolments change over time across different sectors (government, Catholic, independent), school types (primary, secondary, combined), and locations (LGAs).

We discovered several key insights:

  • Overall Growth: Enrolments have been steadily increasing, especially in government primary schools across growing LGAs such as Wyndham and Casey.

  • Uneven Distribution: Some LGAs showed declining or fluctuating enrolments, while others are experiencing rapid growth, highlighting the need for targeted planning.

  • Future Forecasts: We used historical data to predict enrolments for 2025 and created best, base, and worst-case scenarios up to 2030. This shows that depending on conditions, total enrolments could vary widely, making proactive planning essential.

  • Top Schools: We also identified the top 10 schools by predicted enrolment, helping highlight where infrastructure or staffing may need attention.

This analysis can help governments, education departments, and planners make smart decisions about where to build new schools or allocate more resources. It also helps parents, the public, and stakeholders easily understand school trends through visual tools and scenario-based forecasting.